Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Validation of image defect models for optical character recognition

Internal identifier: 002863 (Main/Exploration); previous: 002862; next: 002864

Authors: Y. Li [United States]; D. Lopresti; George Nagy (computer scientist) [United States]; A. Tomkins

Source: IEEE Transactions on Pattern Analysis and Machine Intelligence (ISSN 0162-8828), 1996

RBID: Pascal:96-0139394

Abstract

In this paper, we consider the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. We describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.
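As a rough illustration of the validation criterion stated in the abstract (a defect model is validated when the OCR errors it induces are indistinguishable from those seen on real scans), the following sketch compares two OCR runs. It is a hypothetical example, not one of the four measures defined in the paper: it assumes each run is available as a list of (ground-truth, recognized) character pairs, and it uses the total variation distance between the two normalized confusion distributions as the similarity score (0 means identical error profiles, 1 means disjoint ones). The function names are illustrative only.

```python
from collections import Counter


def error_distribution(pairs):
    """Normalized frequency of each (truth, recognized) confusion.

    Hypothetical helper: correctly recognized characters are ignored,
    so only the OCR errors contribute to the distribution.
    """
    errors = Counter((truth, read) for truth, read in pairs if truth != read)
    total = sum(errors.values())
    if total == 0:
        return {}
    return {confusion: count / total for confusion, count in errors.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def error_rate(pairs):
    """Fraction of characters the OCR engine misrecognized."""
    pairs = list(pairs)
    return sum(truth != read for truth, read in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data standing in for OCR output on real scans and on
    # characters synthesized by a defect model (not data from the paper).
    real = [("e", "c"), ("a", "a"), ("l", "1"), ("e", "e")]
    synthetic = [("e", "c"), ("a", "a"), ("l", "I"), ("e", "e")]
    print(error_rate(real), error_rate(synthetic))
    print(total_variation(error_distribution(real), error_distribution(synthetic)))
```

In this toy case both runs have the same error rate (0.5), yet the distance between their error distributions is 0.5 because only the "e" to "c" confusion is shared. Comparing the errors themselves, rather than aggregate accuracy alone, is what the validation criterion calls for.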



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Validation of image defect models for optical character recognition</title>
<author>
<name sortKey="Li, Y" sort="Li, Y" uniqKey="Li Y" first="Y." last="Li">Y. Li</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>GARI Software</s1>
<s2>Livingston NJ</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Géorgie (États-Unis)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lopresti, D" sort="Lopresti, D" uniqKey="Lopresti D" first="D." last="Lopresti">D. Lopresti</name>
</author>
<author>
<name sortKey="Nagy, G" sort="Nagy, G" uniqKey="Nagy G" first="G." last="Nagy">George Nagy (informaticien)</name>
<affiliation>
<country>États-Unis</country>
<placeName>
<settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
<author>
<name sortKey="Tomkins, A" sort="Tomkins, A" uniqKey="Tomkins A" first="A." last="Tomkins">A. Tomkins</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">96-0139394</idno>
<date when="1996">1996</date>
<idno type="stanalyst">PASCAL 96-0139394 EI</idno>
<idno type="RBID">Pascal:96-0139394</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000A17</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000981</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000921</idno>
<idno type="wicri:doubleKey">0162-8828:1996:Li Y:validation:of:image</idno>
<idno type="wicri:Area/Main/Merge">002A08</idno>
<idno type="wicri:Area/Main/Curation">002863</idno>
<idno type="wicri:Area/Main/Exploration">002863</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Validation of image defect models for optical character recognition</title>
<author>
<name sortKey="Li, Y" sort="Li, Y" uniqKey="Li Y" first="Y." last="Li">Y. Li</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>GARI Software</s1>
<s2>Livingston NJ</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Géorgie (États-Unis)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lopresti, D" sort="Lopresti, D" uniqKey="Lopresti D" first="D." last="Lopresti">D. Lopresti</name>
</author>
<author>
<name sortKey="Nagy, G" sort="Nagy, G" uniqKey="Nagy G" first="G." last="Nagy">George Nagy (informaticien)</name>
<affiliation>
<country>États-Unis</country>
<placeName>
<settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
<author>
<name sortKey="Tomkins, A" sort="Tomkins, A" uniqKey="Tomkins A" first="A." last="Tomkins">A. Tomkins</name>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
<title level="j" type="abbreviated">IEEE Trans Pattern Anal Mach Intell</title>
<idno type="ISSN">0162-8828</idno>
<imprint>
<date when="1996">1996</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
<title level="j" type="abbreviated">IEEE Trans Pattern Anal Mach Intell</title>
<idno type="ISSN">0162-8828</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character image generators</term>
<term>Computer simulation</term>
<term>Computer software</term>
<term>Data structures</term>
<term>Defect model validation</term>
<term>Error analysis</term>
<term>Error classification</term>
<term>Image defect models</term>
<term>Image processing</term>
<term>Optical character recognition</term>
<term>Pattern recognition systems</term>
<term>Real scanned documents</term>
<term>Scanning</term>
<term>Signal distortion</term>
<term>Theory</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Théorie</term>
<term>Traitement image</term>
<term>Structure donnée</term>
<term>Distorsion signal</term>
<term>Calcul erreur</term>
<term>Balayage</term>
<term>Logiciel</term>
<term>Simulation ordinateur</term>
<term>Système reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Logiciel</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we consider the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. We describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Géorgie (États-Unis)</li>
<li>État de New York</li>
</region>
<settlement>
<li>Troy (New York)</li>
</settlement>
<orgName>
<li>Institut polytechnique Rensselaer</li>
</orgName>
</list>
<tree>
<noCountry>
<name sortKey="Lopresti, D" sort="Lopresti, D" uniqKey="Lopresti D" first="D." last="Lopresti">D. Lopresti</name>
<name sortKey="Tomkins, A" sort="Tomkins, A" uniqKey="Tomkins A" first="A." last="Tomkins">A. Tomkins</name>
</noCountry>
<country name="États-Unis">
<region name="Géorgie (États-Unis)">
<name sortKey="Li, Y" sort="Li, Y" uniqKey="Li Y" first="Y." last="Li">Y. Li</name>
</region>
<name sortKey="Nagy, G" sort="Nagy, G" uniqKey="Nagy G" first="G." last="Nagy">George Nagy (informaticien)</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002863 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002863 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:96-0139394
   |texte=   Validation of image defect models for optical character recognition
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024